Power analysis by simulation

Modified: October 17, 2024


# Renv --------------------------------------------------------------------

# Don't forget to use `renv` for a reproducible environment!
# see https://rstudio.github.io/renv/articles/renv.html for more details

# install.packages("renv")  # if you don't have it yet
# library("renv")           # same as above

# renv::init() has already been used to create the renv.lock file (the file that
# records the exact versions of the packages used) in this template, so the
# project only needs to be restored each time you start working. Upon running
# this script, you will be asked to install the packages loaded below, but you
# can change this to suit your needs.
renv::restore()


# CmdStanR for Bayesian modelling ------------------------------------------

# The CmdStan backend, if not already present, has to be installed on your 
# computer first, **outside** of the project:
# install.packages("cmdstanr")
# library("cmdstanr")
# check_cmdstan_toolchain() # check that the C++ toolchain (e.g. RTools) is set up
# nb_cores <- parallel::detectCores() - 1
# install_cmdstan(cores = nb_cores)

# Now inside the project, you need to run a special install for cmdstanr:
# renv::install("stan-dev/cmdstanr")

# Packages ----------------------------------------------------------------

# pacman checks, installs, and loads packages with a single call
# if (!require("pacman")) install.packages("pacman") # already in renv.lock
library("pacman")

# packages to load (and install if needed)
pacman::p_load(
  here,       # easy file paths
  see,        # theme_modern and okabeito palette
  report,     # reporting various info 
  ggbeeswarm, # jittered plots
  scales,     # scales for ggplot2
  Hmisc,      # plot stats
  # ---- Modelling
  faux,      # simulating data
  bayesplot, # plotting for Bayesian models
  brms,      # Bayesian regression models
  rstan,     # Stan interface
  parallel,  # parallel processing
  future,    # parallel with brms
  furrr,     # parallel with purrr
  progressr, # progress bar with furrr
  cmdstanr,  # Stan interface
  emmeans,   # post-hoc tests
  # Should remain last to avoid conflicts with other packages
  quarto,     # quarto reports
  tidyverse   # modern R ecosystem
)

# Global cosmetic theme ---------------------------------------------------

theme_set(theme_modern(base_size = 14)) # from see in easystats
color_scheme_set("green") # for bayesplot

# setting my favourite palettes as ggplot2 defaults
options( 
  ggplot2.discrete.colour   = scale_colour_okabeito,
  ggplot2.discrete.fill     = scale_fill_okabeito,
  ggplot2.continuous.colour = scale_colour_viridis_c,
  ggplot2.continuous.fill   = scale_fill_viridis_c
)

# Fixing a seed for reproducibility ---------------------------------------
set.seed(14051998)


# Adding all packages' citations to a .bib --------------------------------
knitr::write_bib(c(.packages()), file = here("bibliography/packages.bib"))

To perform the power analysis, we simulate many datasets under different scenarios and fit the hypothesised model to each of them to see how well we can recover the “true” effect. In the previous notebook (00-initial-model.qmd) we detailed how we built the model at the heart of this process and described its characteristics. This core fitting procedure is also gathered in the scripts/00_initial_model.R script, from which the power analysis script loads the initial model. This model is then updated iteratively on each simulated dataset. To do so efficiently, we designed a function that simulates data (using the previously defined sim_data function), fits the model, and extracts Bayes factors, and we run this function many times in parallel with the purrr and furrr packages.
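
For context, here is a minimal, hypothetical sketch of what a sim_data function along these lines could look like. The actual implementation lives in the previous notebook; the placeholder level names, the 50% baseline, and the absence of participant-level variability below are simplifying assumptions.

# hypothetical sketch of sim_data, NOT the actual implementation: trial-level
# binary responses from a logistic model with a context-by-task interaction
sim_data <- function (n_ppts, n_trials, beta_context_task) {
  crossing(
    participant = 1:n_ppts,
    context = c("context_a", "context_b"), # placeholder level names
    task = c("task_a", "task_b"),          # placeholder level names
    trial = 1:(n_trials / 4)               # trials split evenly across cells
  ) |>
    mutate(
      # sum-coded numeric copies of the factors, used only for simulation
      context_c = ifelse(context == "context_a", -0.5, +0.5),
      task_c = ifelse(task == "task_a", +0.5, -0.5),
      # linear predictor on the logit scale (fixed effects only, 50% baseline)
      eta = qlogis(0.5) + beta_context_task * context_c * task_c,
      response = rbinom(n(), size = 1, prob = plogis(eta))
    )
}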

The function
sim_and_fit_bf <- function (
    n_ppts, 
    n_trials, 
    beta_context_task, 
    simulation,
    model,
    p
) {
  # Simulating data
  df <- 
    sim_data(
      n_ppts = n_ppts, 
      n_trials = n_trials, 
      beta_context_task = beta_context_task
    ) |> 
    # summarising data
    group_by(participant, context, task) |>
    mutate(n_trials = n()) |> 
    reframe(
      nb_da = sum(response == 1),
      n_trials = unique(n_trials)
    ) |> 
    # defining the sum contrasts
    define_contrasts("context", c(-0.5, +0.5)) |>
    define_contrasts("task", c(+0.5, -0.5))
  
  # Fitting the model
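  # (n_cores and n_iter are globals, defined below in the parallel setup)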
  updated_model <- update(
    object = model,
    newdata = df,
    chains = n_cores,
    cores  = n_cores,
    iter = n_iter,
    silent = 2,
    refresh = 0
  )
  
  # Extracting Bayes factors
  hyp_two_sided <- hypothesis(updated_model, "context1:task1 = 0")
  hyp_one_sided <- hypothesis(updated_model, "context1:task1 > 0")
  
  # retrieving the two-sided BF10 (1/BF01) and the one-sided BF10
  bf10 <- 1 / as.numeric(hyp_two_sided$hypothesis$Evid.Ratio)
  bfdir <- as.numeric(hyp_one_sided$hypothesis$Evid.Ratio)
  
  # updating the `progressr` progress bar
  p()
  
  # returning the BFs
  return (c(bf10, bfdir))
}
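
As a sanity check, a single run could look like this (assuming initial_model, n_cores, and n_iter are already defined; the no-op function passed as p stands in for the progressr handle used below):

# one illustrative run; returns a length-2 vector c(bf10, bfdir)
sim_and_fit_bf(
  n_ppts = 40,
  n_trials = 110,
  beta_context_task = 0.1,
  simulation = 1,        # unused in the function body, kept for pmap
  model = initial_model,
  p = function () NULL   # no-op stand-in for the progressr handle
)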

We first set the parameter values to explore, then create a grid of all their combinations:

# number of participants
n_ppts <- seq(40, 70, 10)

# number of trials per participant
n_trials <- seq(110, 160, 10)

# effect of the interaction
beta_context_task <- seq(0.06, 0.13, 0.01)

# number of simulations for each combination
n_sims <- 100

# creating a grid of parameters and associated datasets
combinations <- crossing(
  n_ppts = n_ppts, 
  n_trials = n_trials, 
  beta_context_task = beta_context_task,
  simulation = 1:n_sims
)
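
This grid contains 4 × 6 × 8 × 100 = 19,200 rows, one simulated dataset (and one model re-fit) per row, which matches the total number of simulations reported below:

# sanity check: 4 x 6 x 8 x 100 = 19200 combinations
nrow(combinations)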

We then run the simulations in parallel:

Parallel processing setup
# We will fit as many models as possible in parallel. If we allow ourselves to 
# use all the cores minus one, the number of models fitted simultaneously is 
# floor(n_available / n_cores), where n_cores is the number of cores (chains) 
# used for each model ("floor" rounds down to avoid recruiting one excess core).

# detecting the number of cores available
n_available <- parallel::detectCores() - 1
# number of cores/chains for each model
n_cores <- 2
# number of models to fit in parallel
n_models <- floor(n_available / n_cores)

# recruiting the workers (cores) for parallel processing
plan(multisession, workers = n_models)

# defining the number of iterations per chain so that the total number of 
# post-warm-up draws stays at 40000 regardless of n_cores (+ 1000 warm-up 
# iterations per chain)
n_iter <- ceiling(40000 / n_cores) + 1000
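# e.g. with n_cores = 2: 40000 / 2 + 1000 = 21000 iterations per chain,
# i.e. 2 x 20000 = 40000 post-warm-up draws per model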
# recording the time at which the simulation started
start <- proc.time()[3]

# running the simulations in parallel
with_progress({
  p <- progressor(steps = nrow(combinations))
  simulation_results <- 
    future_pmap(
      .l = combinations, 
      .f = sim_and_fit_bf, 
      model = initial_model,
      p = p, # to pass to .f for progressr
      .options = furrr_options(seed = TRUE)
    )
})

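# each element of simulation_results is a length-2 vector c(bf10, bfdir);
# unnest_wider() spreads these into the columns bf_1 and bf_2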
simulation_results <-  
  bind_cols(combinations, tibble(bf = simulation_results)) |> 
  unnest_wider(col = bf, names_sep = "_") |> 
  na.omit()

# calculating total computation time
total_time <- 
  (proc.time()[3] - start) |> 
  round(digits = 2) |> 
  seconds_to_period()

cat(paste0(
  "The entire power analysis took ", total_time, ".\n",
  "-------------------------------------------------------------------------",
  "\n"
))

The power analysis by simulation is computationally intensive, so it was run beforehand in a separate script, scripts/01_power_analysis.R. The power analysis presented in the Preregistered Direct Replication manuscript required 19,200 simulations and re-fits of the initial model and took 1 day, 8 hours, and 18 minutes. The results of these simulations are saved in the data/r-data-structures/power-analyses-results-240702.rds file. We load these results and present them in the following sections. All the code in this notebook is displayed for reference, but it is not executed when rendering.

simulation_results <- readRDS("data/r-data-structures/power-analyses-results-240702.rds")

We compute power as the proportion of obtained BFs at or above the evidence threshold of 10, together with the standard error of this estimate (the binomial standard error of a proportion, sqrt(power × (1 − power) / n_sims)). We then plot power as a function of effect size for each combination of the number of participants and the number of trials.

Computing power from the simulation results
bf_threshold <- 10

df_power <- 
  simulation_results |> 
  group_by(n_ppts, n_trials, beta_context_task) |> 
  reframe(
    # power for the two-sided BF10
    bf_10_attained = sum(bf_1 >= bf_threshold),
    bf_10_power = bf_10_attained / n(),
    bf_10_se = sqrt(bf_10_power * (1 - bf_10_power) / n()),
    # power for the directional BF+
    bf_dir_attained = sum(bf_2 >= bf_threshold),
    bf_dir_power = bf_dir_attained / n(),
    bf_dir_se = sqrt(bf_dir_power * (1 - bf_dir_power) / n())
  ) |> 
  mutate(across(c(beta_context_task), factor)) |> 
  mutate(
    n_ppts = 
      paste(n_ppts, "participants") |> 
      factor(),
    n_trials = 
      paste(n_trials, "trials") |> 
      factor()
  )
Plotting the power analysis results
df_power |> 
  ggplot(aes(
    x = beta_context_task,
    y = bf_dir_power,
    color = n_trials,
    fill = n_trials,
    group = n_trials)
  ) +
  geom_hline(
    yintercept = 0.9, 
    linetype = 3, 
    alpha = 0.8, 
    linewidth = 0.7,
    color = "red"
    ) +
  geom_line(linewidth = 0.5) +
  geom_point(
    pch = 21, 
    color = "white",
    show.legend = FALSE,
    size = 3
  ) +
  geom_errorbar(
    aes(
      ymin = bf_dir_power - bf_dir_se,
      ymax = bf_dir_power + bf_dir_se),
    width = 0,
    linewidth = .5,
    show.legend = FALSE
  ) +
  labs(
    title = "A priori power analysis (100 simulations per dot, 19200 total)",
    x = "Interaction effect size",
    y = "Statistical power (proportion of BF+ >= 10)",
    color = "Number of\ntrials"
  ) +
  facet_wrap(vars(n_ppts), nrow = 2) +
  scale_y_continuous(
    limits = c(0, 1),
    breaks = seq(0, 1, 0.2), 
    expand = expansion(c(0, 0.05))) +
  scale_color_viridis_d() + 
  scale_fill_viridis_d() + 
  theme(
    panel.grid.major = element_line(colour = "grey90", linewidth = 0.5),
    panel.grid.minor.y = element_line(colour = "grey95", linewidth = 0.4),
    # panel.background = element_rect(colour = "black"),
    panel.border = element_rect(fill = NA, colour = "black"),
    axis.line = element_blank()
  )

This analysis allows us to make an informed decision about the number of participants and trials needed to achieve a given level of power. We needed to balance the number of participants against the number of trials to keep the experiment feasible while ensuring high power. We opted for 60 participants and 140 trials per participant to achieve a power of 90% for an effect size of at least 0.11, which corresponds to a proportion difference of around 2.8%. This is less than half the effect size observed in the original study (around 6%), so we concluded that our analysis was conservative enough for our purposes.
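
As a rough back-of-the-envelope check of that conversion, treating the interaction coefficient as a simple shift on the logit scale around a 50% baseline (a simplifying assumption):

# a logit-scale effect of 0.11 translated into a proportion difference,
# assuming probabilities near 50% where the logistic curve is steepest
plogis(0.11 / 2) - plogis(-0.11 / 2)
# ~ 0.0275, i.e. about 2.8 percentage points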


═════════════════════════════════════════════════════════════════════════
Analyses were conducted using the R Statistical language (version 4.4.1; R Core
Team, 2024) on Windows 11 x64 (build 22631)
Packages used:
  - quarto (version 1.4.4; Allaire J, Dervieux C, 2024)
  - future (version 1.34.0; Bengtsson H, 2021)
  - progressr (version 0.14.0; Bengtsson H, 2023)
  - brms (version 2.22.0; Bürkner P, 2017)
  - ggbeeswarm (version 0.7.2; Clarke E et al., 2023)
  - faux (version 1.2.1; DeBruine L, 2023)
  - Rcpp (version 1.0.13; Eddelbuettel D et al., 2024)
  - cmdstanr (version 0.8.1.9000; Gabry J et al., 2024)
  - bayesplot (version 1.11.1; Gabry J, Mahr T, 2024)
  - lubridate (version 1.9.3; Grolemund G, Wickham H, 2011)
  - Hmisc (version 5.1.3; Harrell Jr F, 2024)
  - emmeans (version 1.10.5; Lenth R, 2024)
  - see (version 0.9.0; Lüdecke D et al., 2021)
  - report (version 0.5.9; Makowski D et al., 2023)
  - here (version 1.0.1; Müller K, 2020)
  - tibble (version 3.2.1; Müller K, Wickham H, 2023)
  - R (version 4.4.1; R Core Team, 2024)
  - pacman (version 0.5.1; Rinker TW, Kurkiewicz D, 2018)
  - StanHeaders (version 2.32.10; Stan Development Team, 2020)
  - rstan (version 2.32.6; Stan Development Team, 2024)
  - furrr (version 0.3.1; Vaughan D, Dancho M, 2022)
  - ggplot2 (version 3.5.1; Wickham H, 2016)
  - forcats (version 1.0.0; Wickham H, 2023)
  - stringr (version 1.5.1; Wickham H, 2023)
  - tidyverse (version 2.0.0; Wickham H et al., 2019)
  - dplyr (version 1.1.4; Wickham H et al., 2023)
  - purrr (version 1.0.2; Wickham H, Henry L, 2023)
  - readr (version 2.1.5; Wickham H et al., 2024)
  - scales (version 1.3.0; Wickham H et al., 2023)
  - tidyr (version 1.3.1; Wickham H et al., 2024)
═════════════════════════════════════════════════════════════════════════